Jaiden Brown
04/26/2023
https://github.com/fivethirtyeight/negro-leagues-player-ratings
The GitHub repository above hosts the dataset. This analysis uses it to tell the story, through their stats, of many forgotten baseball stars.
Barrier to entry (minimum career length for a player to be included):
Negro Leagues (NLB): 150 games as a batter or 60 games + starts as a pitcher
MLB: 300 games as a batter or 350 games + starts as a pitcher
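The thresholds above can be written as a small check. This is only a sketch: `meets_cutoff` is a hypothetical helper, the position label "Pitcher" is assumed (only "Batter" appears in the output shown here), and "games + starts" is read as the sum of `totalGames` and `careerStarts`, which is an interpretation rather than something the dataset documentation is quoted as saying. The four example rows are made-up values for illustration, not players from the dataset.

```r
library(dplyr)

# Hypothetical helper: TRUE when a player clears the entry threshold.
# Assumption: "games + starts" means totalGames + careerStarts.
# Assumption: pitchers are labelled "Pitcher" in the position column.
meets_cutoff <- function(league, position, totalGames, careerStarts = NA_real_) {
  starts <- coalesce(careerStarts, 0)  # batters have NA starts; treat as 0
  case_when(
    league == "NLB" & position == "Batter"  ~ totalGames >= 150,
    league == "NLB" & position == "Pitcher" ~ totalGames + starts >= 60,
    league == "MLB" & position == "Batter"  ~ totalGames >= 300,
    league == "MLB" & position == "Pitcher" ~ totalGames + starts >= 350,
    TRUE ~ NA  # unknown league or position
  )
}

# Made-up rows, one per league/position combination
toy <- tibble(
  league       = c("NLB", "NLB", "MLB", "MLB"),
  position     = c("Batter", "Pitcher", "Batter", "Pitcher"),
  totalGames   = c(149, 40, 428, 200),
  careerStarts = c(NA, 25, NA, 100)
)

toy_checked <- toy %>%
  mutate(eligible = meets_cutoff(league, position, totalGames, careerStarts))
print(toy_checked)
```

For MLB batters this agrees with the `gameCutoff` column in the data, which holds 300 for the rows displayed below.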
## Rows: 1,117
## Columns: 25
## $ playerID <chr> "culbech01", "gosseph01", "herrmch01", "kratzer01", "pire…
## $ commonName <chr> "Charlie Culberson", "Phil Gosselin", "Chris Herrmann", "…
## $ league <chr> "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "…
## $ hof <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ startYear <dbl> 2012, 2013, 2012, 2010, 2014, 2015, 2011, 2014, 2008, 201…
## $ endYear <dbl> 2020, 2020, 2019, 2020, 2019, 2019, 2019, 2019, 2019, 202…
## $ totalGames <dbl> 428, 359, 370, 335, 302, 326, 461, 419, 386, 313, 376, 48…
## $ positionWar <dbl> -0.620, 0.895, -1.150, 1.715, 0.545, 1.310, -1.555, 4.340…
## $ averageHit <dbl> 41.791451, 72.992105, 3.648244, 21.236047, 67.574190, 10.…
## $ patience <dbl> 13.776205, 28.641438, 70.106180, 19.112442, 18.976314, 24…
## $ power <dbl> 41.709774, 16.879935, 44.105636, 69.670569, 37.244759, 9.…
## $ speed <dbl> 64.524912, 58.562483, 75.850803, 1.334059, 78.872856, 81.…
## $ defense <dbl> 24.25810, 44.89518, 36.48244, 99.59161, 38.95998, 90.4982…
## $ gameCutoff <dbl> 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 30…
## $ playerLabel <chr> "Active Player", "Active Player", "Active Player", "Activ…
## $ shortWar <dbl> -0.2346729, 0.4038719, -0.5035135, 0.8293433, 0.2923510, …
## $ positionCat <chr> "Outfielder", "Middle IF", "Catcher", "Catcher", "Middle …
## $ position <chr> "Batter", "Batter", "Batter", "Batter", "Batter", "Batter…
## $ careerStarts <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strikeOuts <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ control <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ fip <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ whip <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ era <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ fact <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
# Keep only the columns used in this analysis
NLBandMLB <- RawNLBandMLB %>%
  select(playerID, commonName, league, hof, startYear, endYear, totalGames,
         positionWar, averageHit, defense, gameCutoff, playerLabel, shortWar,
         positionCat, position, era)
# Split by league
NLB <- NLBandMLB %>% filter(league == 'NLB')
MLB <- NLBandMLB %>% filter(league == 'MLB')
glimpse(NLBandMLB)
## Rows: 1,117
## Columns: 16
## $ playerID <chr> "culbech01", "gosseph01", "herrmch01", "kratzer01", "pirel…
## $ commonName <chr> "Charlie Culberson", "Phil Gosselin", "Chris Herrmann", "E…
## $ league <chr> "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "MLB", "M…
## $ hof <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ startYear <dbl> 2012, 2013, 2012, 2010, 2014, 2015, 2011, 2014, 2008, 2018…
## $ endYear <dbl> 2020, 2020, 2019, 2020, 2019, 2019, 2019, 2019, 2019, 2020…
## $ totalGames <dbl> 428, 359, 370, 335, 302, 326, 461, 419, 386, 313, 376, 489…
## $ positionWar <dbl> -0.620, 0.895, -1.150, 1.715, 0.545, 1.310, -1.555, 4.340,…
## $ averageHit <dbl> 41.791451, 72.992105, 3.648244, 21.236047, 67.574190, 10.8…
## $ defense <dbl> 24.25810, 44.89518, 36.48244, 99.59161, 38.95998, 90.49823…
## $ gameCutoff <dbl> 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300…
## $ playerLabel <chr> "Active Player", "Active Player", "Active Player", "Active…
## $ shortWar <dbl> -0.2346729, 0.4038719, -0.5035135, 0.8293433, 0.2923510, 0…
## $ positionCat <chr> "Outfielder", "Middle IF", "Catcher", "Catcher", "Middle I…
## $ position <chr> "Batter", "Batter", "Batter", "Batter", "Batter", "Batter"…
## $ era <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
# Names that appear in both the NLB and MLB subsets (one row per name)
duplicatedData <- inner_join(x = NLB, y = MLB, by = "commonName") %>%
  select(commonName) %>%
  distinct()
PlayersInBothLeagues <- inner_join(NLBandMLB, duplicatedData, by = "commonName")
# shortWar for each of those players, colored by league
ggplot(PlayersInBothLeagues, mapping = aes(shortWar, commonName, color = league)) +
  geom_point()
Is the distribution of WAR similar across both leagues?
Conclusion: The talent in both leagues is comparable.
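A numeric summary can back up a visual comparison like this one. The sketch below groups players by `league` and reports the median and quartiles of `shortWar`. `war_by_league` is a hypothetical helper, not part of the original analysis, and the six-row `toy` tibble holds made-up values purely to show the shape of the result; in practice the function would be called on `NLBandMLB` or `PlayersInBothLeagues`.

```r
library(dplyr)

# Hypothetical helper: per-league summary of shortWar.
war_by_league <- function(players) {
  players %>%
    group_by(league) %>%
    summarise(
      n          = n(),
      median_war = median(shortWar, na.rm = TRUE),
      q25        = unname(quantile(shortWar, 0.25, na.rm = TRUE)),
      q75        = unname(quantile(shortWar, 0.75, na.rm = TRUE)),
      .groups    = "drop"
    )
}

# Made-up values for illustration only (not taken from the dataset)
toy <- tibble(
  league   = c("MLB", "MLB", "MLB", "NLB", "NLB", "NLB"),
  shortWar = c(1, 2, 3, 2, 4, 6)
)

war_summary <- war_by_league(toy)
print(war_summary)
```

If the medians and interquartile ranges for the two leagues overlap heavily on the real data, that would support the conclusion; a large gap would call for a closer look. A boxplot by league (`geom_boxplot()`) would show the same comparison graphically.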
Who are the superstars in the NLB?
## # A tibble: 20 × 5
##    commonName       league averageHit shortWar   hof
##    <chr>            <chr>       <dbl>    <dbl> <dbl>
##  1 Ty Cobb          MLB         100       8.02     1
##  2 Charlie Smith    NLB         100      10.3      0
##  3 Nap Lajoie       MLB          99.9     7.09     1
##  4 Ed Delahanty     MLB          99.9     7.97     1
##  5 Ted Williams     MLB          99.9     8.92     1
##  6 Rogers Hornsby   MLB          99.9     9.23     1
##  7 Tris Speaker     MLB          99.8     7.70     1
##  8 Rod Carew        MLB          99.8     5.05     1
##  9 Tony Gwynn       MLB          99.7     4.45     1
## 10 Josh Gibson      NLB          99.7    10.9      1
## 11 George Sisler    MLB          99.7     4.18     1
## 12 Wade Boggs       MLB          99.6     5.96     1
## 13 Honus Wagner     MLB          99.6     8.25     1
## 14 Stan Musial      MLB          99.6     6.83     1
## 15 Harry Heilmann   MLB          99.5     5.31     1
## 16 Roberto Clemente MLB          99.5     5.84     1
## 17 Eddie Collins    MLB          99.4     7.00     1
## 18 Heavy Johnson    NLB          99.4     5.42     0
## 19 Jose Altuve      MLB          99.4     4.50     0
## 20 Babe Ruth        MLB          99.3    11.1      1
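The code that produced the table above is not shown in the report. A table of this shape can be built by sorting on `averageHit` and keeping the top rows; the sketch below is one way to do that, assuming this is how the ranking was made. `top_hitters` and its `only_league` argument are hypothetical names. The four sample rows are copied from the table above (values as printed), listed out of order so the sort is visible.

```r
library(dplyr)

# Hypothetical helper: top-n players by averageHit, optionally within one league.
top_hitters <- function(players, n = 20, only_league = NULL) {
  if (!is.null(only_league)) {
    players <- players %>% filter(league == only_league)
  }
  players %>%
    arrange(desc(averageHit)) %>%
    select(commonName, league, averageHit, shortWar, hof) %>%
    head(n)
}

# Four rows taken from the table above, deliberately unsorted
sample_rows <- tibble(
  commonName = c("Babe Ruth", "Heavy Johnson", "Ty Cobb", "Josh Gibson"),
  league     = c("MLB", "NLB", "MLB", "NLB"),
  averageHit = c(99.3, 99.4, 100, 99.7),
  shortWar   = c(11.1, 5.42, 8.02, 10.9),
  hof        = c(1, 0, 1, 1)
)

top_all <- top_hitters(sample_rows, n = 4)
top_nlb <- top_hitters(sample_rows, n = 2, only_league = "NLB")
print(top_nlb)
```

Passing `only_league = "NLB"` on the full `NLBandMLB` data would answer the question in the heading directly, since the combined table is dominated by MLB players.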
After comparing the data, I believe MLB was right to recognize these players and add their stats to the MLB record. The two leagues had very similar levels of competition, and many players who came out of the NLB produced at similar, if not higher, levels in MLB than they had in the NLB.